Imaging and seismic modelling inside volcanoes using machine learning

Despite advances in seismology and computing, the ability to image subsurface volcanic environments is poor, limiting our understanding of the overall workings of volcanic systems. This is related to substantive structural heterogeneities which strongly scatters seismic waves obscuring the ballistic arrivals normally used in seismology for wave velocity determination. Here we address this constraint by, using a deep learning approach, a Fourier neural operator (FNO), to model and invert seismic signals in volcanic settings. The FNO is trained using 40,000+ simulations of elastic wave propagation through complex volcano models, and includes the full scattered wavefield. Once trained, the forward network is used to predict elastic wave propagation and is shown to accurately reproduce the seismic wavefield. The FNO is also trained to predict heterogeneous velocity models given a limited set of input seismograms. It is shown to capture details of the complex velocity structure that lie far outside the ability of current methods available in volcano imagery.

www.nature.com/scientificreports/ form of neural networks both to model and to invert complex volcano seismic signals. An alternative to the traditional geophysical inversion techniques is to consider the data inversion task as an optimisation problem and deploy recent advances in machine learning. The central idea is to use the entire scattered wavefield to address the imaging problem-not merely events that are coherently reflected off spatially correlatable geological structures. The accuracy of these machine learning techniques, specifically deep neural networks, has dramatically increased across nearly every field of research 20 . Recent advances in machine learning can be attributed to the increase in computer power and the adoption of deep-learning methods 21 . In geophysics, machine learning has been long established 22,23 , however, a suite of recent applications has appeared in the literature. These include seismic processing, automatic fault detection, noise suppression, micro-seismic detection, as well as diffraction identification [24][25][26][27][28][29] . Here we deploy a special type of neural network called a Fourier Neural Operator (FNO) which has been proposed to predict the results from a partial differential equation (operator) given an input model and initial conditions 30 . In our case the network predicts the Green's functions for a specific equation and hence the reverse network is capable of generating the inverse Green's functions which is the kernel for velocity inversion. This study assesses the problem from a theoretical viewpoint focusing on the question: can neural networks accurately model and invert seismic wavefields in a complex volcanic environment?

Methods
Fourier neural networks. A partial differential equation (PDE) can be viewed as a differential operator L mapping a function g to the function f, i.e. Lg = f. This is the forward problem where the operator L in this case is the elastic wave equation. The inverse case is where the inverse operator L −1 maps the function f to the function g. We can automatically link the operator equation with the equation Gm = d where the elastodynamic Green's functions G for a given Earth model m produce the seismic data d and the inverse relation G −1 .d = m generates an Earth model m given the data and the inverse Green's functions, G −1 . In our specific case, the application of L determines G for a specific model m. Neural networks are a mapping from an input vector space to an output vector space or an input image vector to an output image vector which is an equivalent representation of the differential operator above and, in theory, can learn to predict the operator or inverse operator given appropriate input training data. The FNO network was the first network to model turbulent flows and was used to predict results from the Navier-Stokes equation, Burger's equation and Darcy's equations 28 . Recently, it has been used to predict two phase flow in the subsurface given a subsurface model and initial conditions 31 . The FNO has some distinct advantages over other networks such as convolution neural networks (CNN) that make it readily adaptable to learning differential operators. The method is resolution and mesh invariant hence it can take a low resolution input training dataset and predict higher resolution outputs on arbitrary meshes. The convolution in the CNN is replaced by a multiplication in the Fourier domain and this has some advantages. The first is the compute complexity of a convolution versus the fast Fourier transform, making the FNO more computationally efficient. PDEs are continuous and the global nature of the Fourier space is more suited to capturing this continuity versus the local convolution kernels in a CNN. Additionally, higher frequencies can be truncated without significantly impacting the solution, which is in essence a compression routine, increasing the compute efficiency. In practical terms for this work, large volumes of synthetic seismic data are generated using a partial differential solver given an input model. This can be considered as the 'observed' data. The FNO is trained using this dataset for both the forward and inverse problem. When subsequently applying the trained FNO for the forward problem, the network takes the models as input and predicts the expected elastic wavefields and synthetic seismograms. When applying the FNO for the inversion network, the input is the 'observed' seismograms from a previously unseen simulation in a specific earth model of interest (equivalent to observed data), and the prediction is that specific velocity model. The FNO prediction output is orders of magnitude faster than running a PDE solver, but the generation of the training data and the FNO training itself does incur a significant upfront computational cost, dependent on the problem.

Synthetic data generation ('the observations').
By definition there is no physical dataset which can be used to train the FNO system to invert seismograms for a known subsurface velocity model. Supervised FNO training requires solutions to the wave-equation as inputs. Therefore we need to generate training datasets using known input velocity models, source locations and source mechanisms and use a forward wave equation solver to output training sets comprising seismic wavefields and synthetic seismograms. Given the heterogeneous nature of volcanic materials we need to choose a solver that can propagate seismic elastic waves in materials with broadband heterogeneity with strong velocity gradients. Here we chose a 2D elastic lattice method 32 as a full wavefield simulator. We generate two different training datasets and one independent test dataset. It is stressed that the test data are never used in the training of the networks. The generation of synthetic datasets requires several choices to be made which constrain the solution as without such constraints, the parameter space is too large for practical implementation. The assumptions made in the generation of the datasets are detailed below.
2D models. Although there is a recent example in acoustic wave applications 33 , we are not aware of previous work which attempts to use the FNO to learn the elastic wave equation Green's function and its inverse, for a complex model. In order to explore the feasibility of this approach in complex media, we focused on 2D examples which reduce the computational demand and allows us to constrain the approximate cost and challenges associated with producing reasonable results. Future moves to 3D will increase the computational cost in generating the synthetic datasets and in training the FNO along with the need to use a larger neural network.
Source. The source locations were chosen at random anywhere inside the 2D model but the depth was constrained to be less than 6 km. The source mechanism was the moment tensor (m xx , m zz , m xz ) where component www.nature.com/scientificreports/ values were randomly chosen between 0 and 1 for the diagonal components and 0 and 0.5 for m xz . The source wavelet for all simulations was a Ricker 4 Hz wavelet. This was chosen as it overlaps with the long period volcanic signals typically observed in such settings. The higher the frequency, the higher the spatial resolution required and hence the computational cost grows for both the simulations and the FNO training.
Training velocity models. As Neural Networks are poor at extrapolation, for any real world application the choice of training set is a key consideration. The dominant characteristics of the expected physical structure must be included, in a general framework, in the training set. Here we utilise two different velocity model populations, with different statistical distributions to generate synthetics for FNO network training. Due to their formation history the mechanical complexity of volcano edifices means that a deterministic calculation of their fine scale structure is out of reach, and a statistical approach is more appropriate. Here the topography was fixed for all model populations as a Gaussian hill, as topography is one of the only parameters that can be accurately measured in real environments. The first velocity model population, denoted as population L, has a p-wave velocity gradient randomly chosen within the bounds of a minimum velocity of 1500 m/s and a maximum of 3500 m/s. Four layers, tracking the topography, were superimposed on top of this gradient, where the thickness of the layers were randomly chosen and velocities randomly perturbed from the gradient value. The s-wave velocity was set to two thirds the p-wave value. In the second model population, the same velocity gradient was used but the velocity field was randomly perturbed with a normal distribution and smoothed to yield a tomography-like velocity. This population is denoted as population T. Both populations L and T are each composed of 20,000 individual velocity models used to generate the synthetic seismograms for training the FNO networks and a further 500 velocity models for testing the networks. The test models in the populations are not used in training. Two sample models, drawn from the populations L and T, are shown in Fig. 1. By restricting the study to these two populations, we expect the network to be only able to learn how to generate wavefields from models drawn from these distributions or return velocity models within these distributions. In a real world scenario, the solution to the problem of constraining an input velocity distribution could be to take known geological constraints on the possible structures inside a particular volcano and build a large distribution of representative geological velocity models with the appropriate topography. This is the approach that will be taken for a full 3D inversion of real data in future studies.
Validation velocity models. Another population of models was created to validate the approach but was never used to train any network. This population, denoted population V, is similar to population T but without the smoothening and with a higher magnitude velocity perturbation. www.nature.com/scientificreports/ Simulation parameters. The model size was 20 km in length and 10 km deep with a spatial sampling of 10 m. The output temporal resolution of the synthetic seismograms was 1 ms. Output seismograms are 5 s long. Absorbing boundaries were imposed on the side and bottom to avoid unrealistic reflections. Each individual simulation for one model outputs the entire wavefield, both the x-and z-components of the displacement every 0.1 s along with synthetic seismograms. Figure 1, panels (A) and (B) shows one snapshot of the wavefield overlain on the velocity model. The location of the stations are shown in Fig. 1 as triangles on the surface of the model volcano. These locations were randomly selected sensors from the full synthetic sensor network of 40 sensors disturbed evenly across the full model. We randomly selected 10 sensors from 40 as having a dense full azimuthal coverage is rarely the case in physical world volcano settings. Adding more sensors, adds more information and increases the training computational requirements but the additional information can help increase the accuracy of the network. Figure 2 shows the synthetic seismograms from all 40 sensors for both models simulated using the elastic wave equation solver.
Training. Each sample from model populations L and T with a random source location and a random source mechanism is used as inputs to the 2D elastic wave simulator and the resultant synthetic seismograms and wavefields are outputted and saved. Therefore, the training datasets consist of the velocity models from the populations L and T, the source locations and the synthetic wavefields and seismograms. The training datasets comprise a total of 40,000 numerical seismic wavefield forward propagator simulations (20 k for each model population) and the model populations L and T. These training datasets are used to teach the several different FNO networks presented in the results. We will denote the training datasets using the same terminology as the L and T model populations. In order to train the FNO to act as a fast forward wavefield propagator (i.e. to output wavefields and seismograms), it was trained using the source locations, the source mechanisms and the velocity models as inputs. To train the FNO to act as an inverse modelling tool (i.e. to output a velocity model) the training datasets have the inputs and outputs switched and the networked mirrored. The results for four networks are presented in this work. The first two are the forward modelling networks for both training datasets L and T. The second pair are for the inversion networks for both the training datasets L and T. The networks are four layers deep and built using PyTorch 34 (version 1.9.0). For both inverse and forward networks, as discussed above, the training data consists of 20,000 samples with a batch size of 20 run over 200 epochs. A variable learning rate was imposed with a 50% reduction after 25 epochs. The input data were normalised with a Gaussian normalisation function and the optimiser was the AdamW algorithm though the choice did not have a significant impact on the results. These parameters were found by trial and error on initial tests balancing the amount of data required to provide www.nature.com/scientificreports/ reasonable results against the computational time. Figure 3 shows the normalised training error for training the layered inversion network. This is typical of all the training runs. The test data which consists of 500 samples, was not used in the training. The main parameters controlling the training were the size of the training samples, the complexity in the velocity models and the number of epochs. Initially, tests on a homogenous model distribution (not shown here) required at least 5000 training samples before a reasonable training error was attained after 100 epochs. For the more complex velocity model populations T and L, the number of samples needed to be increased towards 20,000. A test with 40,000 sample simulations for population L did not significantly improve the results over the 20,000 samples and started to show the typical signs of overfitting in Neural Networks. The accuracy of the results on the test data is discussed in the "Results" section.

Results
Forward modelling results. Before we consider the application of the FNO to the classical geophysical inversion problem of resolving an accurate velocity model given seismic data, we will demonstrate the ability of the FNO to simulate wavefields and seismograms, i.e. the forward problem. The inputs are the velocity models, the source location and the source mechanism. The outputs are the seismic wavefields and/or the seismograms. The seismograms are a spatial subset of the total wavefields, captured at the model top surface. The output wavefield for two different time periods produced by using the elastic equation solver is shown in the top panels in Fig. 4. The model used in this case is a single example from test data from population L and when input into the appropriate FNO network predicts the output wavefield, middle panels. The comparison between the two show that the FNO system does an excellent job at replicating the main features of the wavefield. This is confirmed by the bottom panel plots where the difference between the snapshots is presented. The accuracy is not at the level one would expect when comparing two different explicit solvers to the elastic wave equation, but the dominant high amplitude phases in the wavefield are captured with the correct amplitudes. The FNO prediction equally resolves the main p-wave and s-wave phases as seen in the 3 s snapshots. The results for network trained using the training dataset T are similar. As the full wavefield can never be physically recorded in practice, a more useful approach would be to focus on the seismograms from known locations. This has the added benefit of decreasing the computational cost as the input data volumes are smaller, as are the output layers in the network. Figure 5 shows the results, where instead of using the entire wavefields to train the network, we only use the surface seismogram data, in this case all 40 sensors. The top panel in Fig. 5 shows the simulated outputs from the direct solver and the middle panel shows the prediction from the FNO trained with population L; the bottom panel shows the difference between the top two panels. Again the FNO demonstrates an excellent ability to capture the main features of the wavefield when employed as a forward propagator. This is despite the limited coverage of merely surface seismic records as input data for training. This bodes well for real world applications, where usually only surface seismograms are available. The difference between the two do show some edge artefacts in the neural network output along with small scale misfits in the small amplitude scattering events. Several reversible normalisation methods, including time and log normalisation, were tested to determine if these smaller amplitude effects could be better predicted but with limited additional success. The amount of training samples does play an important role here but as discussed in the training section, the training and test error plateaus out after 20,000 training samples. Obviously the more complex the velocity model the larger the scattering which will diminish the ability of the FNO to predict all of the details in the seismograms. However, for the low frequency components of volcanic signals, the wavelengths can be on the order of 100's of metres hence these smallest scale heterogeneities will 'smooth out' . This is important to consider when interpreting the inversion results in the next section as the main amplitude phases which are accurately predicted can contain enough information to reproduce the velocity model. www.nature.com/scientificreports/ Inversion results (velocity model recovery from seismic data). For the velocity inversion problem, the training was performed in exactly the same way as described for the forward problem and in the training section, except for two changes. The first difference was to reverse the inputs and outputs and adjust the layers to match the input sizes, i.e. mirror the network. The second difference was that when training the inversion FNO network the inputs were 10 randomly located seismograms, from the 40 described above, for both the x-component and z-component of seismic displacement. In all cases, white noise amplitudes of 15% were added to the input seismic traces. Inputs also included the known velocity model for each seismogram set and the source location. Following training, during the application of the inversion FNO the input is the set of 10 twocomponent seismograms (i.e. two individual shot gathers) from a previously unseen model. The FNO output is the predicted velocity model and the source location. In Fig. 6, examples of an input model from populations L and T are shown to illustrate the results. These are representative of all results, and all results shown use samples from the different populations that were not used in training the networks. The left-hand panels show the true models with the seismic source location highlighted by the asterisks. The output results from the FNO networks is shown in the middle panels, i.e. the network velocity model and source location prediction. The right-hand panels show a profile through the models comparing the true model with the predicted FNO result. It is clear from Fig. 6 that the FNO method is capable of accurately predicting a velocity model where the input is unseen surface seismograms generated in the unseen model (taken from populations L or T). In order to quantify the results, the normalised RMS misfit between the true model inputs and the FNO generated outputs are shown for 20 different inversions using unseen samples from both populations L and T. The source location misfit is the difference between the (x, z) locations for true value and output value. The velocity normalised RMS is the sum over all velocity differences between the true and predicted model for every grid point in the model. The RMS values are shown in Fig. 7 for both populations L and T. The error in the velocity model is relatively small, and as illustrated in the velocity field comparisons and profiles in Fig. 6, the velocity model prediction is very good and is also relatively robust to the addition of noise in the training dataset up to the maximum value of 15%. The source location is also very good but more sensitive to the addition of noise. Several different normalisation strategies were explored to understand and improve the source error. The results pointing to higher accuracy for the velocity model and increased error for the location appear robust. In practise, this is less of a concern as traditional methods to determine the source location are well known and accurate even www.nature.com/scientificreports/ in the presence of a long wavelength velocity structure. As discussed in the previous section, the FNO network was used to predict the output from the elastic wave equation, in essence learning how to reproduce the Green's function for such systems. The inverse network can therefore be considered to have encoded the inverse Green's function for the wave equation and therefore allows for the prediction of an accurate velocity model following the analogous equation The examples above show the results where input seismic data presented to the FNO were calculated in previously unseen velocity models with the same statistical properties as the velocity models used to generate the FNOs training seismic data. The input was a single shot gather with a random source location, in each case. We now show a final example where we use seismograms generated using a simulation in velocity models from . Right panel, profiles through the true and predicted models. The seismic source location is highlighted by the asterisks. In the FNO output, the source location is also predicted by the network. Both velocity models here are each recovered from input data associated with only one randomly located shot into 10 randomly located surface receivers, each. www.nature.com/scientificreports/ population V as the input data to a FNO network that was trained using seismic data from velocity models drawn from population T. The prediction is from a single set of seismograms from a single source and one velocity model sample from population V, i.e. a shot gather. The top panel in Fig. 8 shows that this does not accurately reproduce the known velocity model. This is not a surprise as Neural Networks can struggle to generalisation across different training populations. However Fig. 8 does show that some features are recovered so we opt to stack the predictions from multiple shot gathers (random source locations, and their seismograms), in the same velocity model. This is shown in the middle and bottom panels in Fig. 8 where we stack velocity model predictions from 20 and 100 shot gathers respectively. The convergence to the true model is clearly demonstrated showing that even with a velocity model population that differs in statistical detail from the training populations, an accurate velocity model can be recovered through predicted velocity model stacking. Despite this success, it should be stressed that the appropriate choice of statistical parameters of the model populations using in FNO training will play a dominate role in the ability of networks to converge to a solution. It still is an open question as to how to determine the statistical properties of the training dataset to maximise the ability of the network to converge to a general solution or even if this will be possible given the extreme heterogeneity in velocity structure in volcanic environments in real world scenarios. We propose that a priori information such as topography, the population of rock physical properties, borehole petrophysical log information etc. could effectively be used to geologically constrain the 'bandwidth' of training velocity model populations, for a given volcanic environment. Overall, the results for both the inversion and forward problem highlight the applicability of the FNO network to solve a highly non-linear inversion problem in elastically heterogeneous environments, and to replicate the seismic wavefield for rapid forward calculations.

Discussion
The FNO can take unseen synthetic data and discover the seismic velocity model and seismic source that would generate this dataset, if using a traditional PDE seismic wavefield forward solver. It can also be used to simulate synthetic data given an input model that is accurate when compared with a traditional PDE solver, except for the low amplitude scattering. The objective is not to replace traditional solvers as they are required for the generation of the training data. As seismic imaging on volcanoes is still a huge challenge we suggest that the inverse www.nature.com/scientificreports/ problem is of more interest, as this may allow access to the true velocity structure when the input to the Neural Network is real data. A challenge with using real data as inputs into these networks may centre on sufficient environmental noise samples in network training, but fortunately this can be captured statistically 35 . A more challenging question is how to build a population of numerical training models that sufficiently represent the physical world? The requirement is that such a numerically trained FNO will return an accurate representation of real world velocity structure, when presented with appropriate real field seismic data. Here, in this work, we demonstrate that if that training model population building challenge can be overcome, then the FNO has a high probably of accurately inverting for real world velocity structure. The details of how this might be achieved will be the focus of future work, but it will likely require the building of hybrid statistical-deterministic models, based on high spatial resolution mapping. The computational scaling to 3D remains a significant challenge, not in the prediction, but in the generation of a significant number of input datasets. Whilst this will require large resources, the resolution invariance of the FNO means lower resolution training data can be used and still provide higher resolution predictions. This theoretical study has demonstrated the viability of deploying machine learning tools as a fresh approach to the arduous task of deriving high fidelity velocity models in volcanic environments. If such models can be obtained they will, for example, increase our understanding of how seismic sources relate to the specifics of how magma is emplaced. It would also allow local volcano observatory based real time tracking of magma migration through repeated imagery with ease, as the forward FNO is fast and computationally efficient. Such a routine capability could radically impact risk mitigation on volcanoes.

Data availability
All the data used were generated by the computer codes discussed in the text. The datasets generated and/or analysed during the current study are not publicly available due to the size of the datasets but can be recomputed following the method outlined in the manuscript.